Incremental re-training of a hybrid English-French MT system with Customer Translation Memory data
Abstract
In this paper, we present SAIC’s hybrid machine translation (MT) system and show how it was adapted to the needs of our customer, a major global fashion company. The adaptation was performed in two ways: off-line selection of domain-relevant parallel and monolingual data from a background database, and on-line incremental adaptation with customer parallel and translation memory (TM) data. The translation memory was integrated into the statistical search using two novel features. We show that these features can be used to produce nearly perfect translations of data that fully, or to a large extent partially, match TM entries, without sacrificing translation quality on data without TM matches. We also describe how human post-editing effort was reduced, both through significantly better MT quality after adaptation and through improved formatting and readability of the MT output.
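The abstract does not say how a "full" or "partial" TM match is measured, nor what the two novel features are. As a rough illustration only, the sketch below scores a source sentence against TM entries with a word-level edit distance, which is a common way to define fuzzy matches in translation memory tools. The function names, the lowercasing and whitespace tokenization, and the normalization by the longer sentence are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Optional, Tuple


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance computed over word tokens instead of characters."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match_score(source: str, tm_source: str) -> float:
    """Similarity in [0, 1]; 1.0 is a full match (hypothetical definition)."""
    a, b = source.lower().split(), tm_source.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - word_edit_distance(a, b) / longest


def best_tm_match(
    source: str, tm: List[Tuple[str, str]]
) -> Optional[Tuple[float, str, str]]:
    """Return (score, tm_source, tm_target) for the closest TM entry, or None."""
    best = None
    for tm_source, tm_target in tm:
        score = fuzzy_match_score(source, tm_source)
        if best is None or score > best[0]:
            best = (score, tm_source, tm_target)
    return best
```

A score of this kind could then be thresholded to decide whether a sentence counts as a full match, a partial match, or no match; how the paper's features actually use TM matches inside the decoder is not described in the abstract.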
Related papers
Omnifluent English-to-French and Russian-to-English Systems for the 2013 Workshop on Statistical Machine Translation
This paper describes Omnifluent™ Translate – a state-of-the-art hybrid MT system capable of high-quality, high-speed translations of text and speech. The system participated in the English-to-French and Russian-to-English WMT evaluation tasks with competitive results. The features which contributed the most to high translation quality were training data sub-sampling methods, document-specific ...
CMU Syntax-Based Machine Translation at WMT 2011
We present the Carnegie Mellon University Stat-XFER group submission to the WMT 2011 shared translation task. We built a hybrid syntactic MT system for French–English using the Joshua decoder and an automatically acquired SCFG. New work for this year includes training data selection and grammar filtering. Expanded training data selection significantly increased translation scores and lowered OO...
Moses SMT
SAP has been heavily involved in the implementation and deployment of machine translation (MT) within the company since the early 1990s. In 2013, SAP initiated an extensive proof-of-concept project, based on the statistical MT system Moses (Koehn et al., 2007), in collaboration with the external implementation partner CrossLang. The project focused on the use of Moses SMT as an aid to translat...
MSR-MT: The Microsoft Research Machine Translation System
MSR-MT is a data-driven MT system that combines rule-based and statistical techniques with example-based transfer. This hybrid, large-scale system is capable of learning all its knowledge of lexical and phrasal translations directly from data. MSR-MT has undergone rigorous evaluation showing that, trained on a corpus of technical data similar to the test corpus, its output surpasses the quality...